Image ranking based on attribute correlation

ABSTRACT

Images are retrieved and ranked according to relevance to attributes of a multi-attribute query through training image attribute detectors for different attributes annotated in a training dataset. Pair-wise correlations are learned between pairs of the annotated attributes from the training dataset of images. Image datasets are searched via the trained attribute detectors for images comprising attributes in a multi-attribute query. The retrieved images are ranked as a function of comprising attributes that are not within the query subset plurality of attributes but are paired to one of the query subset plurality of attributes by the pair-wise correlations, wherein the ranking is an order of likelihood that the different ones of the attributes will appear in an image with the paired one of the query subset plurality of attributes.

BACKGROUND

The present invention relates to utilizing computer vision applications for the automated searching of human image data for people as a function of visual appearance characteristics.

Video, still camera and other image data feeds may be searched to find targeted objects or individuals. For example, to search for a person, one may provide description information indicating certain personal facial visual traits to a manager of a video archive (for example, wearing glasses, baseball hat, etc.), wherein the archive may be manually scanned looking for one or more people with similar characteristics. Such a manual search is both time and human resource consuming. Moreover, human visual attention may be ineffective, particularly for large volumes of image data. Due to many factors, illustratively including an infrequency of activities of interest, a fundamental tedium associated with the task and poor reliability in object tracking in environments with visual clutter and other distractions, human analysis of input information may be both expensive and ineffective.

Automated input systems and methods are known wherein computers or other programmable devices directly analyze video data and attempt to recognize objects, people, events or activities of concern through computer vision applications. Some existing approaches learn a separate appearance model for each of a plurality of image attributes, for example for bald, mustache, beard, hat, sunglasses, light skin-tones, etc. When given a multi-attribute query, such systems may add up the confidence scores for each individual query attribute. Thus, a search for a (i) male (ii) wearing glasses and (iii) a beard may retrieve a plurality of results that each have a confidence score meeting all three of the attributes, or that each meet one or more. However, the former technique may miss results, for example where one of the attributes is indistinct in a given image resulting in its exclusion. The latter may return too many results, including impossibilities or improbabilities as to meeting all three (such as an image of a person wearing glasses who is a young girl). Thus, the returned results may miss a target, or return too many hits to be analyzed efficiently.

BRIEF SUMMARY

In one embodiment of the present invention, a method for retrieving and ranking multi-attribute query results according to relevance to attributes of a multi-attribute query includes training image attribute detectors for each of different attributes that are annotated in a training dataset of images of people and learning (via a processor, etc.) pair-wise correlations between each pair of the annotated attributes from the training dataset of images. Image datasets are searched via the trained attribute detectors for images comprising attributes in a multi-attribute query. The retrieved images are ranked as a function of comprising attributes that are not within the query subset plurality of attributes but are paired to one of the query subset plurality of attributes by the pair-wise correlations, wherein the ranking is an order of likelihood that the different ones of the attributes will appear in an image with the paired one of the query subset plurality of attributes.

In another embodiment, a system has a processing unit, computer readable memory and a computer readable storage medium device with program instructions to train image attribute detectors for each of different attributes that are annotated in a training dataset of images of people and learn pair-wise correlations between each pair of the annotated attributes from the training dataset of images. Image datasets are searched via the trained attribute detectors for images comprising attributes in a multi-attribute query. The retrieved images are ranked as a function of comprising attributes that are not within the query subset plurality of attributes but are paired to one of the query subset plurality of attributes by the pair-wise correlations, wherein the ranking is an order of likelihood that the different ones of the attributes will appear in an image with the paired one of the query subset plurality of attributes.

In another embodiment, an article of manufacture has a computer readable storage medium device with computer readable program code embodied therewith, the computer readable program code comprising instructions that, when executed by a computer processor, cause the computer processor to train image attribute detectors for each of different attributes that are annotated in a training dataset of images of people and learn pair-wise correlations between each pair of the annotated attributes from the training dataset of images. Image datasets are searched via the trained attribute detectors for images comprising attributes in a multi-attribute query. The retrieved images are ranked as a function of comprising attributes that are not within the query subset plurality of attributes but are paired to one of the query subset plurality of attributes by the pair-wise correlations, wherein the ranking is an order of likelihood that the different ones of the attributes will appear in an image with the paired one of the query subset plurality of attributes.

In another embodiment, a method provides a service for retrieving and ranking multi-attribute query results according to relevance to attributes of a multi-attribute query. Computer-readable program code is integrated into a computer system that includes a processor, a computer readable memory in circuit communication with the processor, and a computer readable storage medium in circuit communication with the processor. The processor executes program code instructions stored on the computer-readable storage medium via the computer readable memory and thereby trains image attribute detectors for each of different attributes that are annotated in a training dataset of images of people, and learns pair-wise correlations between each pair of the annotated attributes from the training dataset of images. Image datasets are searched by the processor via the trained attribute detectors for images comprising attributes in a multi-attribute query. The retrieved images are ranked by the processor as a function of comprising attributes that are not within the query subset plurality of attributes but are paired to one of the query subset plurality of attributes by the pair-wise correlations, wherein the ranking is an order of likelihood that the different ones of the attributes will appear in an image with the paired one of the query subset plurality of attributes.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

These and other features of this invention will be more readily understood from the following detailed description of the various aspects of the invention taken in conjunction with the accompanying drawings in which:

FIG. 1 is a flow chart illustration of an embodiment of a method or system for ranking multi-attribute query results according to relevance to the multi-attribute query according to the present invention.

FIGS. 2A through 2E are diagrammatical illustrations of image field configurations for extracting feature vectors according to embodiments of the present invention.

FIG. 3 is a diagrammatical illustration of an exemplary image retrieval and ranking as a function of a multi-attribute query according to embodiments of the present invention.

FIG. 4 is a block diagram illustration of a computerized implementation of an embodiment of the present invention.

FIG. 5 is a block diagram illustration of an apparatus or device embodiment of the present invention.

The drawings are not necessarily to scale. The drawings are merely schematic representations, not intended to portray specific parameters of the invention. The drawings are intended to depict only typical embodiments of the invention and, therefore, should not be considered as limiting the scope of the invention. In the drawings, like numbering represents like elements.

DETAILED DESCRIPTION

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in a baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

Referring now to FIG. 1, a method, system or process for ranking multi-attribute query results according to relevance to the multi-attribute query is illustrated. A training dataset 102 of images of people annotated with various, different attributes (for example, blonde hair, long hair, eyeglasses, baseball hat, earrings, mustache, etc.) is used to train or learn image detectors at 104 and thereby generate a set of individual detectors 106 for each of the annotated attributes. At 108 a plurality of pair-wise correlations between each pair of attributes from the set of attributes is learned from the training dataset of images, for example via a processor, programmed device, etc.
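As an illustration of what a pair-wise correlation between annotated attributes can look like, the following minimal sketch computes an empirical correlation (the phi coefficient) for each attribute pair directly from annotations. This is an assumed stand-in for illustration only: the function name `pairwise_correlations` and the use of the phi coefficient are not taken from the specification, which in later passages learns the pair-wise terms as weights of a ranking model.

```python
from itertools import combinations
from math import sqrt


def pairwise_correlations(annotations, attributes):
    """Return {(a, b): phi} for every attribute pair.

    annotations: list of sets, one set of attribute labels per training image.
    attributes:  list of all attribute names (hypothetical vocabulary).
    """
    n = len(annotations)
    corr = {}
    for a, b in combinations(attributes, 2):
        n_a = sum(1 for labels in annotations if a in labels)
        n_b = sum(1 for labels in annotations if b in labels)
        n_ab = sum(1 for labels in annotations if a in labels and b in labels)
        # Phi coefficient for two binary variables; 0.0 when it is undefined
        # (an attribute that is present in all images or in none).
        denom = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
        corr[(a, b)] = (n * n_ab - n_a * n_b) / denom if denom else 0.0
    return corr


toy_annotations = [{"woman"}, {"woman"}, {"beard"}, {"beard"}]
toy_corr = pairwise_correlations(toy_annotations, ["woman", "beard"])
```

In this toy dataset the two attributes never co-occur, so the computed value for the pair is strongly negative, which is the kind of signal a ranking stage could use to demote images showing a beard when the query asks for a woman.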

At 114 an input image dataset 112 is searched via the trained attribute detectors for images comprising at least one of, or otherwise satisfying, the multi-attribute query attributes 110. Examples of the input image dataset 112 include archived video data stored in a storage means, live video images processed in real-time through computer vision processes, and still photography or image archives or real-time feeds; still other image datasets 112 may be practiced. At 116 a plurality of images is retrieved from the searching of the image dataset 112 that each comprise at least one of the query attributes while also taking into account (thus, in response to) information from the trained attribute detectors corresponding to attributes that are not a part of the query but are relevant to the query attributes as a function of the learned plurality of pair-wise correlations. Accordingly, at 118 the retrieved results are ranked as a function of a total number of their attributes that are also the query attributes. For example, the ranking function ranks images containing the highest number of attributes in the query at the top, images with the next highest number of matching attributes next, and so on.

More particularly, feature vector outputs of the individual image detectors 106 in response to multi-attribute samples present in the training dataset 102 are used to learn a multi-attribute retrieval and ranking model through learning pair-wise correlations at 108 of all pairs of the attributes. Thus, embodiments of the present invention provide a multi-attribute retrieval and ranking model that retrieves pluralities of result images from searching image data that each (i) comprise at least one of the query 110 attributes, and (ii) wherein the learned pair-wise correlations indicate that the other attributes of the returned images are also concurrent to the query attributes and/or to other remainder attributes within the complete set of considered attributes that are not part of the query but are relevant to the query attributes. The model further ranks or otherwise prioritizes the returned results as a function of a total number of their attributes that are also the query attributes, with images with higher numbers of the query attributes ranked ahead of those with lower numbers.

The model thus retrieves and ranks target images sought in satisfaction of the query 110 as a function of both total number of relevant attributes and the pair-wise correlations of the query attributes to the remaining attributes. In some embodiments, the attributes may be weighted, and the ranking thus a function of total values of the weighted attributes. For example, where two results have the same number of matching attributes but with different weightings, the one with more heavily weighted attributes will be ranked first.
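The weighted ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `rank_by_weighted_matches`, the dictionary layout, and the default weight of 1.0 are hypothetical, and detected attributes are treated as already-thresholded sets rather than detector confidences.

```python
def rank_by_weighted_matches(images, query, weights=None):
    """Order image ids by the total weight of the query attributes they contain.

    images:  dict mapping image id -> set of detected attribute names.
    query:   set of query attribute names.
    weights: optional dict mapping attribute name -> weight (default 1.0).
    """
    weights = weights or {}

    def score(item):
        _, attrs = item
        return sum(weights.get(a, 1.0) for a in query if a in attrs)

    # sorted() is stable, so tied images keep their original relative order.
    ranked = sorted(images.items(), key=score, reverse=True)
    return [image_id for image_id, _ in ranked]


toy_images = {
    "x": {"woman", "red shirt", "sunglasses"},
    "y": {"man", "red shirt"},
}
toy_query = {"man", "red shirt", "sunglasses"}
unweighted = rank_by_weighted_matches(toy_images, toy_query)
weighted = rank_by_weighted_matches(toy_images, toy_query, {"man": 3.0})
```

Both toy images match two of the three query attributes, so the unweighted order is a tie; giving "man" a heavier weight moves image "y" ahead of image "x".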

A large variety of features may be extracted for representing each training dataset 102 or image dataset 112 image. Color based features include color histograms, color correlograms, color wavelets and color moments. Texture may be encoded using wavelet texture and Local Binary Patterns (LBP) histograms, while shape information is represented using edge histograms, shape moments and scale-invariant feature transform (SIFT) based visual words, etc. Referring now to FIGS. 2A through 2E, in one embodiment, feature vectors are extracted from the image field 201 of each facial image 203 with respect to five different configurations and concatenated: a layout configuration of FIG. 2A that extracts features from each of a three-by-three array of grids 202; a center configuration of FIG. 2B that extracts features from only the center grid 203 (and which thus focuses on facial features 205 of the underlying facial image 203); a global configuration of FIG. 2C that extracts features from the whole image field 201 irrespective of the grids 202; a vertical configuration of FIG. 2D that extracts features from three vertical columns 204 formed by the grids 202; and a horizontal configuration of FIG. 2E that extracts features from three horizontal rows 206 formed by the grids 202. This enables localization of individual attribute detectors: for example, in one embodiment, the attribute detectors for “hat” or “bald” attributes may give higher weights to features extracted from the topmost row 206t of grids 202 in the horizontal configuration of FIG. 2E, and in the three top-most grids 202t1, 202t2 and 202t3 in the layout configuration of FIG. 2A.
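The five configurations can be sketched in a few lines. The sketch below is illustrative only: it assumes a grayscale image given as a list of rows at least three pixels high and wide, and it uses the mean intensity of each region as a placeholder feature where an embodiment would use color, texture or shape descriptors. The names `region_mean` and `extract_configurations` are hypothetical.

```python
def region_mean(image, r0, r1, c0, c1):
    """Mean pixel value of image rows r0..r1-1 and columns c0..c1-1."""
    values = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(values) / len(values)


def extract_configurations(image):
    """Concatenate placeholder features from the five grid configurations."""
    h, w = len(image), len(image[0])
    rows = [0, h // 3, 2 * h // 3, h]   # boundaries of the three grid rows
    cols = [0, w // 3, 2 * w // 3, w]   # boundaries of the three grid columns

    # Layout: one feature per cell of the three-by-three grid.
    layout = [region_mean(image, rows[i], rows[i + 1], cols[j], cols[j + 1])
              for i in range(3) for j in range(3)]
    # Center: only the middle cell of the grid.
    center = [layout[4]]
    # Global: the whole image field, irrespective of the grid.
    whole = [region_mean(image, 0, h, 0, w)]
    # Vertical: three full-height columns.
    vertical = [region_mean(image, 0, h, cols[j], cols[j + 1]) for j in range(3)]
    # Horizontal: three full-width rows.
    horizontal = [region_mean(image, rows[i], rows[i + 1], 0, w) for i in range(3)]
    return layout + center + whole + vertical + horizontal


toy_image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
toy_features = extract_configurations(toy_image)
```

With this ordering, a detector for an attribute such as "hat" could place most of its weight on the first three layout entries and the first horizontal entry, which all come from the top of the image.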

Training the multi-attribute retrieval and ranking model may be accomplished through minimizing ranking losses. In some embodiments, training at 104 comprises extracting image features and employing Adaboost, an Adaptive Boosting machine learning algorithm, to learn discriminative features for each detector attribute. Further, a wide variety of extracted attributes may be used in the training dataset 102 to learn or train the detectors at 104 to thereby rank and retrieve images in the learned models 106 based on semantic attributes. Examples include attributes describing the physical traits of a person including facial attributes (e.g. hair color, presence of beard or mustache, presence of eyeglasses or sunglasses etc.), body attributes (e.g. color of shirt and pants, striped shirt, long/short sleeves etc.), demographic attributes (e.g. age, race, gender) and even non-visual attributes (e.g. voice type, temperature and odor) which could potentially be obtained from other sensors. Moreover, while searching for images of people may involve only a single object class (human faces), embodiments may be utilized for attribute-based retrieval of images containing multiple object classes (for example, clothing, associated tangible items such as backpacks or bicycles, etc.). Still other classes and attributes will be apparent to one skilled in the art.

Prior art approaches typically learn a separate appearance model for each attribute, and when given a multi-attribute query simply add up the confidence scores of each individual attribute to return results. However, such approaches consider only the attributes that are part of the query for retrieving relevant images, and generally fail to consider the concurrence relationships between these attributes as well as between other, different attributes outside of the query. In contrast, embodiments of the present invention also take into account pair-wise correlations to those other remainder attributes within the complete set of considered attributes that are not a part of the query but are useful in ranking the results. For example, an Asian nationality person is highly unlikely to have blonde hair, but is highly likely to have black hair, and a female person is extremely unlikely to have a beard or a mustache; merely adding confidences of the separate detectors under prior art methods will not reflect these concurrence relationships, and will thus fail to take into account attributes that are not part of the query.

Embodiments of the present invention provide frameworks for multi-attribute image retrieval and ranking which retrieve images based not only on the words that are part of the query 110, but also consider the remaining attributes within the vocabulary that could potentially provide information about the query. For example, FIG. 3 illustrates one application of a query 110 for “young Asian woman wearing sunglasses.” Images are retrieved and ranked with respect to relevance to attributes 302 that are part of the query, and further taking into account attributes 304 that are not part of the query and inferring through pair-wise attribute correlations that images are unlikely to be relevant (thus, ranked lower or in some cases eliminated) if they also have a mustache 306, beard 308, bald 310 or blonde/light red hair 312 attributes, but are more likely to be relevant (thus, ranked higher) if they have a black hair attribute 314, resulting in prioritized and ranked image results 320.

Pair-wise correlation co-occurrences may vary in ranking effect. For example, for a query containing the attribute “young,” pictures containing people with gray hair can be discarded as gray hair usually occurs only in older people and a person with gray hair is unlikely to be “young”; thus, such image results may be filtered out or otherwise removed from the results retrieved at 116 (FIG. 1) and/or ranked at 118 (FIG. 1) as a function of said specific pair-wise concurrence. Similarly, images containing bald people or persons with mustaches and beards, which are male specific attributes, may be discarded or discounted very heavily (and thus ranked more lowly) during retrieval 116 and/or ranking at 118 when one of the constituent attributes of the query is “woman.” While an individual detector for the attribute “woman” may implicitly learn such features, experiments have found that when searching for images based on queries containing fine-grained parts and attributes, explicitly modeling the correlations and relationships between attributes can lead to substantially better results.

Ranking based on a single attribute can sometimes seem unnecessary: for example, for a single-attribute “beard” query, one can simply classify images into people with beards and people without beards. Multi-attribute queries, however, depending on the application, may have multiple levels of relevance for retrieval and ranking. For example, with respect to a “man wearing a red shirt and sunglasses” query, since sunglasses can be easily removed, it is reasonable to assume that images containing men wearing a red shirt but without sunglasses are also relevant to the query, and thus embodiments of the present invention might not remove such images but merely rank them lower as less relevant than images of men with both a red shirt and sunglasses. In another example, for two images that each have two of the query attributes, an image of a woman with a red shirt and sunglasses may be ranked lower than an image of a man in the red shirt with no sunglasses as a function of the learned pair-wise correlations, in one aspect, as sunglasses can be easily removed while the gender of a person cannot be easily changed. Traditionally, ranking is treated as a distinct problem within information retrieval. However, embodiments of the present invention integrate the ranking into the retrieval process in the same structured learning framework, where learning to rank and retrieve are simply optimizations of the same model according to different performance measures.

Supporting image retrieval and ranking based on multi-label queries is non-trivial, as the number of possible multi-label queries for a vocabulary of size L is 2^(L). Most prior art image ranking/retrieval approaches deal with this problem by learning separate classifiers for each individual label, and retrieving multi-label queries by heuristically combining the outputs of the individual labels. In contrast, embodiments of the present invention introduce a principled framework 106 for training and retrieval of multi-label queries, wherein attributes within a single object category and even across multiple object categories are interdependent so that modeling the correlations between them leads to significant performance gains in retrieval and ranking.

Some embodiments of the present invention use structured support vector machines (SVM) to address prediction problems involving complex outputs. Structured SVMs provide efficient solutions for structured output problems, while also modeling the interdependencies that are often present in the output spaces of such problems. They may be effectively used for object localization and modeling the concurrence relationships between attributes, posing a single learned framework at 108 for ranking and retrieval while also modeling the correlations between the attributes.

Embodiments of the present invention provide image retrieval and ranking based on the concept of reverse learning. Thus, given a set of labels {X} and a set of training images {Y}, a mapping is learned corresponding to each label {x_(i)} within the set of labels to predict a set of images {y*} that contain said label. Since reverse learning has a structured output (a set of images), it fits well into the structured prediction framework, and allows for learning based on the minimization of loss functions corresponding to a wide variety of performance measures: examples include hamming loss, precision and recall, and other performance measures may also be practiced in embodiments of the present invention. The present approach improves upon the reverse learning approach in three different ways. First, a single framework is provided for both retrieval and ranking. This is accomplished by adopting a ranking approach where the output is a set of images ordered by relevance, enabling integration of ranking and reverse learning within the same framework. Secondly, training is facilitated, as well as retrieval and ranking, based on queries consisting of multiple labels. Finally, pair-wise correlations between different labels (attributes) are modeled, learned and exploited for retrieval and ranking.

Retrieval.

Given a set of labels in a multi-attribute query {Q}, which is a subset of the set of all possible attribute labels {X}, embodiments of the present invention retrieve images from an input source of images (for example, a source video, database, etc.) as the set of images {Y} that are relevant to the multi-attribute query label set {Q}. Under a reverse learning formulation, for an input {Q} the output is the set of images {y*} that contains all the constituent attributes of {Q}, determined through a prediction function that maximizes a score parameterized by the weight vector {w} according to equation (1):

$\begin{matrix}{y^{*} = \arg\max\limits_{y \subseteq \mathcal{Y}} w^{T}\psi(\mathcal{Q}, y)} & (1)\end{matrix}$

where the weight vector {w} is composed of two components: {w^(a)} for modeling the appearance of individual attributes and {w^(p)} for modeling the dependencies between them. Components of equation (1) may be defined as follows:

$\begin{matrix}{w^{T}\psi(\mathcal{Q}, y) = \sum\limits_{x_{i} \in \mathcal{Q}} w_{i}^{a}\Phi_{a}(x_{i}, y) + \sum\limits_{x_{i} \in \mathcal{Q}} \sum\limits_{x_{j} \in \mathcal{X}} w_{ij}^{p}\Phi_{p}(x_{j}, y)} & (2) \\ \text{where} & \\ {\Phi_{a}(x_{i}, y) = \sum\limits_{y_{k} \in y} \varphi_{a}(x_{i}, y_{k})} & (3) \\ {\Phi_{p}(x_{j}, y) = \sum\limits_{y_{k} \in y} \varphi_{p}(x_{j}, y_{k})} & (4)\end{matrix}$

More particularly, equation (3) defines a feature vector {φ_(a)(x_(i),y_(k))} representing image y_(k) for attribute x_(i). Equation (4) defines a vector {φ_(p)(x_(j),y_(k))} indicating the presence of attribute x_(j) in image y_(k), which is not known during a test phase and hence φ_(p)(x_(j),y_(k)) may be treated as a latent variable, or set to be the output of an independently trained attribute detector. In equation (2), w_(i) ^(a) is a standard linear model for recognizing attribute x_(i) based on the feature representation and w_(ij) ^(p) is a potential function encoding the correlation between the pair of attributes x_(i) and x_(j). By substituting (3) into the first part of (2), one can intuitively see that this represents the summation of the confidence scores of all the individual attributes x_(i) in the query Q, over all of the images y_(k) in the subset y. Similarly, the second (pair-wise) term in (2) represents the correlations between the query attributes x_(i) and the entire set of attributes X, over images in the set y. Hence, the pair-wise term ensures that information from attributes that are not present in the query attribute set Q is also utilized for retrieving the relevant images.
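The two-term score of equation (2) can be sketched as follows. This is a simplified illustration rather than the claimed model: it assumes each φ_(a) and φ_(p) value is a single scalar detector output instead of a feature vector, so each weight is a scalar as well, and the function name `score_image_set` and the dictionary layouts are hypothetical.

```python
def score_image_set(query, all_attributes, image_set, w_a, w_p, phi_a, phi_p):
    """Score a candidate image subset for a query, following equation (2).

    query:          attributes x_i in the query Q.
    all_attributes: the full attribute vocabulary X.
    image_set:      candidate subset y of image ids.
    w_a:            dict attribute -> appearance weight (scalar stand-in).
    w_p:            dict (x_i, x_j) -> pair-wise weight.
    phi_a, phi_p:   dict image id -> dict attribute -> detector output.
    """
    # Appearance term: confidences of the query attributes, summed over images.
    appearance = sum(w_a[x_i] * phi_a[y_k][x_i]
                     for x_i in query for y_k in image_set)
    # Pair-wise term: every vocabulary attribute x_j contributes through its
    # learned correlation with each query attribute x_i.
    pairwise = sum(w_p[(x_i, x_j)] * phi_p[y_k][x_j]
                   for x_i in query for x_j in all_attributes
                   for y_k in image_set)
    return appearance + pairwise


toy_w_a = {"woman": 1.0, "beard": 1.0}
toy_w_p = {("woman", "woman"): 0.0, ("woman", "beard"): -2.0}
toy_phi = {
    "img1": {"woman": 0.5, "beard": 0.0},
    "img2": {"woman": 0.5, "beard": 1.0},
}
score_1 = score_image_set(["woman"], ["woman", "beard"], ["img1"],
                          toy_w_a, toy_w_p, toy_phi, toy_phi)
score_2 = score_image_set(["woman"], ["woman", "beard"], ["img2"],
                          toy_w_a, toy_w_p, toy_phi, toy_phi)
```

Both toy images have the same "woman" confidence, but the negative pair-wise weight between "woman" and "beard" lowers the score of the image in which a beard is detected, even though "beard" is not part of the query.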

Thus, given a set of multi-label training images and their respective labels, embodiments of the present invention train a model for the weight vector {w} which given a multi-label sub-set query {Q} can correctly predict the subset of images {y*} in a test set {Y_(t)} which contain all the labels x_(i). In general, training includes all queries (containing a single attribute as well as multiple attributes) that occur in the training set. During the training phase, embodiments of the present invention learn the weight vector {w} such that, for each query {Q} the desired output set of retrieved images {y*} has a higher score (equation (1)) than any other set {y}. This can be performed using a standard max-margin training formulation:

$\begin{matrix}{\arg\min\limits_{w,\xi}\; w^{T}w + C\sum\limits_{t}\xi_{t}} & \\ {\forall t:\; w^{T}\psi(\mathcal{Q}_{t}, y_{t}^{*}) - w^{T}\psi(\mathcal{Q}_{t}, y_{t}) \geq \Delta(y_{t}^{*}, y_{t}) - \xi_{t}} & (5)\end{matrix}$

where C is a parameter controlling the trade-off between the training error and regularization, {Q_(t)} is a set of training queries, {ξ_(t)} is the slack variable corresponding to {Q_(t)}, and {Δ(y*_(t),y_(t))} is a loss function. Unlike standard SVMs which use a simple 0/1 loss, embodiments of the present invention may thus employ a complex loss function, which enables heavily (or gently) penalizing outputs {y_(t)} that deviate significantly (or slightly) from the correct output {y*_(t)} measured based on an optimizing performance metric. For example, {Δ(y*_(t),y_(t))} may be defined for optimizing training error based on different performance metrics as follows:

$\begin{matrix}{{\Delta \left( {y_{t}^{*},y_{t}} \right)} = \left\{ \begin{matrix}{1 - \frac{{y_{t}\bigcap y_{t}^{*}}}{y_{t}}} & {precision} \\{1 - \frac{{y_{t}\bigcap y_{t}^{*}}}{y_{t}^{*}}} & {recall} \\{1 - \frac{{{y_{t}\bigcap y_{t}^{*}}} + {{{\overset{\dddot{}}{y}}_{t}\bigcap{\overset{\dddot{}}{y}}_{t}^{*}}}}{}} & {{hamming}\mspace{14mu} {loss}}\end{matrix} \right.} & (6)\end{matrix}$

Similarly, one can optimize for other performance measures such as F/Beta, an F-score (or F-measure) performance metric having a non-negative real “beta” weight. In one aspect, the reverse learning approach according to embodiments of the present invention allows one to train a model optimizing for a variety of performance measures.

The quadratic optimization problem in Equation (5) contains {O(|Q|2^(|Y|))} constraints, which is exponential in the number of training instances. Hence, embodiments of the present invention may adopt a constraint generation strategy comprising an iterative procedure that involves solving Equation (5), initially without any constraints, and then at each iteration adding the most violated constraint of the current solution to the set of constraints. At each iteration of the constraint generation process, the most violated constraint is given by:

$\begin{matrix}{\xi_{t} \geq \max\limits_{y_{t} \subset \mathcal{Y}}\left\lbrack \Delta\left(y_{t}^{*},y_{t}\right) - \left( w^{T}\psi\left(\mathcal{Q}_{t},y_{t}^{*}\right) - w^{T}\psi\left(\mathcal{Q}_{t},y_{t}\right) \right) \right\rbrack} & (7)\end{matrix}$

Equation (7) can be solved in O(|Y|²) time. During prediction, embodiments solve for (1), which can be efficiently performed in O(|Y|log(|Y|)).

Ranking.

With minor modifications, the framework for image retrieval is also utilized for ranking multi-label queries. In the case of image ranking, given the multi-attribute query {Q}, the images in the set of images {Y} may be ranked according to their relevance to the {Q} attributes. Unlike image retrieval, where given an input {Q} the output is a subset of test images, in the case of ranking the output of a prediction function is a permutation {z*} of the set of images defined by equation (8):

$\begin{matrix}{z^{*} = \arg\max\limits_{z \in \pi(\mathcal{Y})}w^{T}\psi\left(\mathcal{Q},z\right)} & (8)\end{matrix}$

where π(Y) is the set of all possible permutations of the set of images Y. Thus, the weight vector {w} may be used to rank through equation (9):

$\begin{matrix}{{{w^{T}{\psi \left( {,z} \right)}} = {{\sum\limits_{x_{i} \in }{w_{i}^{a}{{\hat{\Phi}}_{a}\left( {x_{i},z} \right)}}} + {\sum\limits_{x_{i} \in }{\sum\limits_{x_{j} \in }{w_{ij}^{p}{{\hat{\Phi}}_{p}\left( {x_{j},z} \right)}}}}}}{where}} & (9) \\{{{\hat{\Phi}}_{a}\left( {x_{i},z} \right)} = {\sum\limits_{z_{k} \in z}{{A\left( {r\left( z_{k} \right)} \right)}{\varphi_{a}\left( {x_{i},z_{k}} \right)}}}} & (10) \\{{{\hat{\Phi}}_{p}\left( {x_{j},z} \right)} = {\sum\limits_{z_{k} \in z}{{A\left( {r\left( z_{k} \right)} \right)}{\varphi_{p}\left( {x_{j},z_{k}} \right)}}}} & (11)\end{matrix}$

A(r) is any non-increasing function, and {r(z_(k))} is the rank of image z_(k). Further, the ranked results may be limited to a top set {K} by defining A(r) as:

A(r)=max(K+1−r,0)  (12)

In one aspect, this may ensure that the lower (top) ranked images are assigned higher weights, and since the equation equals zero for ranks greater than K, only the top K images of the ranking are considered.
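The truncating weight of equation (12) and its use in the rank-weighted sums of equations (10) and (11) may be sketched in Python as follows. This is an illustration only: the function names are hypothetical, ranks are assumed to be 1-based, and each per-image feature is reduced to a single number for brevity.

```python
def rank_weight(rank, top_k):
    """A(r) = max(K + 1 - r, 0) for a 1-based rank r, as in equation (12)."""
    return max(top_k + 1 - rank, 0)


def rank_weighted_sum(values_in_rank_order, top_k):
    """Sum of A(r(z_k)) * phi(z_k) over images listed from rank 1 downward.

    Mirrors the form of equations (10) and (11), with a scalar phi per image
    (a simplification assumed for this sketch).
    """
    return sum(
        rank_weight(rank, top_k) * value
        for rank, value in enumerate(values_in_rank_order, start=1)
    )


if __name__ == "__main__":
    # With K = 3 the weights for ranks 1..5 are 3, 2, 1, 0, 0.
    print([rank_weight(r, 3) for r in range(1, 6)])
    print(rank_weighted_sum([1.0, 2.0, 4.0, 8.0], 3))
```

Images ranked below position K receive zero weight and therefore do not influence the score of a candidate ranking.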

In contrast to prior art ranking methods which simply divide a set of training images into two sets (relevant and irrelevant) corresponding to each query and just learn a binary ranking, embodiments of the present invention utilize multiple levels of relevance. For example, given a query {Q}, training images may be divided into {|Q|+1} sets based on their relevance. Thus, the most relevant set, comprising images that contain all the attributes in the query {Q}, may be assigned a relevance {rel(j)=|Q|}; the next set consists of images containing any {|Q|−1} of the attributes, which are assigned a relevance {rel(j)=|Q|−1}; and so on, with a last set consisting of images with none of the attributes present in the query, assigned relevance rel(j)=0. This ensures that, in case there are no images containing all the query attributes, images that contain the greatest number of query attributes are ranked highest. Further, while equal weights may be assigned to all attributes, in some embodiments the attributes may be weighted. For example, in one embodiment, higher weights are assigned to attributes involving race or gender, which are more difficult to modify, resulting in higher rankings relative to lower-weighted apparel attributes that can be easily changed (e.g. wearing sunglasses, red shirt, etc.).
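The division of training images into {|Q|+1} relevance sets may be illustrated by the following Python sketch for the equal-weight case. The function names and the attribute labels in the example are hypothetical, and each image is assumed to be annotated with a set of attribute labels.

```python
def relevance(image_attributes, query):
    """Number of query attributes present in the image (equal attribute weights)."""
    return len(set(image_attributes) & set(query))


def group_by_relevance(images, query):
    """Split images into |Q| + 1 lists; the list at index r holds images with rel = r.

    `images` maps an image identifier to its set of annotated attributes.
    """
    levels = [[] for _ in range(len(set(query)) + 1)]
    for image_id, attributes in images.items():
        levels[relevance(attributes, query)].append(image_id)
    return levels


if __name__ == "__main__":
    query = {"male", "beard", "glasses"}
    images = {
        "a": {"male", "beard", "glasses", "hat"},
        "b": {"male", "beard"},
        "c": {"hat"},
        "d": {"glasses"},
    }
    for rel, ids in enumerate(group_by_relevance(images, query)):
        print(rel, ids)
```

If no image falls in the highest level, the images in the next non-empty level become the most relevant ones, matching the behavior described above.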

Thus, embodiments of the present invention may train a ranking model using a max-margin framework pursuant to equation (13):

$\begin{matrix}\begin{matrix}{\arg\min\limits_{w,\xi}} & {w^{T}w + C\sum\limits_{t}\xi_{t}} \\ {\forall t} & {w^{T}\psi\left(\mathcal{Q}_{t},z_{t}^{*}\right) - w^{T}\psi\left(\mathcal{Q}_{t},z_{t}\right) \geq \Delta\left(z_{t}^{*},z_{t}\right) - \xi_{t}}\end{matrix} & (13)\end{matrix}$

where {Δ(z*,z)} is a function denoting the loss incurred in predicting a permutation {z} instead of a correct permutation {z*}, and may be defined as {Δ(z*,z)=1−NDCG@100(z*,z)}, wherein NDCG is a normalized discounted cumulative gain score, a standard measure used for evaluating ranking algorithms that may be defined by equation (14):

$\begin{matrix}{{N\; D\; C\; {G@k}} = {\frac{1}{Z}{\sum\limits_{j = 1}^{k}\frac{2^{{rel}{(j)}} - 1}{\log \left( {1 + j} \right)}}}} & (14)\end{matrix}$

where rel(j) is the relevance of the j^(th) ranked image and Z is a normalization constant to ensure that the correct ranking results in an NDCG score of 1. Since NDCG@100 takes into account only the top 100 ranked images, we may set K=100 in Equation (12).
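Equation (14) and the corresponding ranking loss may be sketched in Python as follows. The function names are hypothetical. The normalization constant Z is assumed here to be the unnormalized score of the ideal ordering (relevances sorted in descending order), and the natural logarithm is used; the choice of logarithm base cancels in the normalized score.

```python
import math


def dcg_at_k(relevances, k):
    """Unnormalized sum of (2^rel(j) - 1) / log(1 + j) over the top k positions."""
    return sum(
        (2 ** rel - 1) / math.log(1 + j)
        for j, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """NDCG@k for a list of relevances given in ranked order (position 1 first)."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # assumption: no relevant images at all yields a score of 0
    return dcg_at_k(relevances, k) / ideal


def ranking_loss(relevances, k=100):
    """Loss of a predicted ranking, 1 - NDCG@k."""
    return 1.0 - ndcg_at_k(relevances, k)


if __name__ == "__main__":
    print(ndcg_at_k([3, 2, 1, 0], 4))   # ideal ordering
    print(ndcg_at_k([0, 1, 2, 3], 4))   # reversed ordering scores lower
    print(ranking_loss([3, 2, 1, 0]))
```

The relevances passed in may be the multi-level values rel(j) described above, so that an ordering placing images with more query attributes first incurs a smaller loss.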

In the case of ranking, the max-margin problem (Equation 13) again contains an exponential number of constraints and we adopt the constraint generation procedure, where the most violated constraint is iteratively added to the optimization problem. The most violated constraint is given by:

$\begin{matrix}{\xi_{t} \geq \max\limits_{z_{t} \in \pi(\mathcal{Y})}\left\lbrack \Delta\left(z_{t}^{*},z_{t}\right) - \left( w^{T}\psi\left(\mathcal{Q}_{t},z_{t}^{*}\right) - w^{T}\psi\left(\mathcal{Q}_{t},z_{t}\right) \right) \right\rbrack} & (15)\end{matrix}$

which, after omitting terms independent of z_(t) and substituting Equations (9), (10) and (14), can be rewritten as:

$\begin{matrix}{{{\arg {\max\limits_{z_{t} \in {\pi {()}}}{\sum\limits_{k = 1}^{100}{{A\left( z_{k} \right)}{W\left( z_{k} \right)}}}}} - {\sum\limits_{k = 1}^{100}\frac{2^{{rel}{(z_{k})}} - 1}{\log \left( {1 + k} \right)}}}{{where}\text{:}}} & (16) \\{{W\left( z_{k} \right)} = {{\sum\limits_{x_{i} \in _{t}}{w_{i}^{a}{\varphi_{a}\left( {x_{i},z_{k}} \right)}}} + {\sum\limits_{x_{j} \in _{t}}{\sum\limits_{x_{j} \in }{w_{ij}^{p}{\varphi_{p}\left( {x_{j},z_{k}} \right)}}}}}} & (17)\end{matrix}$

Equation (16) is a linear assignment problem in z_(k) and can be efficiently solved using a Kuhn-Munkres algorithm. During prediction, Equation (8) needs to be solved, which can be rewritten as:

$\begin{matrix}{\arg {\max\limits_{z \in {\pi {()}}}{\sum\limits_{k}{{A\left( {r\left( z_{k} \right)} \right)}{W\left( z_{k} \right)}}}}} & (18)\end{matrix}$

Since A(r) is a non-increasing function of the rank, ranking can be performed by simply sorting the samples in descending order of the values of W(z_(k)).
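Prediction by sorting may be illustrated with the following Python sketch, which computes a per-image score of the form of equation (17) and orders the images by that score. This is a simplified illustration under stated assumptions: the function names, attribute labels and weight values are hypothetical, each appearance feature φ_(a) is reduced to a single detector confidence rather than a feature vector, and pair-wise weights absent from the dictionary are treated as zero.

```python
def image_score(query, appearance, presence, w_a, w_p):
    """W(z_k): unary terms for query attributes plus pair-wise terms.

    appearance: attribute -> detector confidence for this image (scalar phi_a)
    presence:   attribute -> presence estimate for this image (phi_p)
    w_a:        attribute -> unary weight
    w_p:        (query attribute, other attribute) -> pair-wise weight
    """
    unary = sum(w_a[x_i] * appearance[x_i] for x_i in query)
    pairwise = sum(
        w_p.get((x_i, x_j), 0.0) * presence[x_j]
        for x_i in query
        for x_j in presence
    )
    return unary + pairwise


def rank_images(query, images, w_a, w_p):
    """Return image identifiers sorted by descending score W."""
    scores = {
        image_id: image_score(query, appearance, presence, w_a, w_p)
        for image_id, (appearance, presence) in images.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    query = ["beard"]
    w_a = {"beard": 1.0}
    w_p = {("beard", "male"): 0.5, ("beard", "female"): -0.5}
    images = {
        "img1": ({"beard": 0.5}, {"beard": 0.5, "male": 1.0, "female": 0.0}),
        "img2": ({"beard": 0.75}, {"beard": 0.75, "male": 0.0, "female": 1.0}),
        "img3": ({"beard": 0.125}, {"beard": 0.125, "male": 0.0, "female": 0.0}),
    }
    print(rank_images(query, images, w_a, w_p))
```

In this example the first image outranks the second even though its detector confidence for the query attribute is lower, because a correlated attribute outside the query contributes through the pair-wise term.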

Referring now to FIG. 4, an exemplary computerized implementation of an embodiment of the present invention includes a computer or other programmable device 522 in communication with other devices 506 (for example, a video or still image camera or server, or a memory device comprising a database of images, etc.) that retrieves and ranks multi-attribute query results according to relevance to the multi-attribute query as described above with respect to FIGS. 1 through 3. For example, in response to implementing computer readable code instructions 542 residing in a computer memory 516 or a storage system 532 or in another device 506 accessed through a computer network infrastructure 526, the processor (CPU) 538 may retrieve and rank multi-attribute query results according to relevance to the multi-attribute query as described above with respect to FIGS. 1 through 3.

FIG. 5 illustrates an apparatus or device embodiment 402 of the present invention that ranks multi-attribute query results according to relevance to the multi-attribute query. More particularly, an Image Attribute Detector Trainer and Attribute Mapper 404 trains or learns image detectors and thereby generates the set of individual detectors 106 (FIG. 1) for each of the annotated attributes in the training dataset 102 (FIG. 1) and further learns pair-wise correlations between the attributes from the training dataset. Thus, a Multi-Attribute Retrieval and Ranking Model 408 searches via the trained attribute detectors 106 the image dataset 112 for images comprising at least one of a multi-attribute query 110 subset plurality of the annotated attributes; retrieves images that each comprise at least one of the query subset plurality of attributes and in response to information from the trained attribute detectors corresponding to attributes that are not a part of the query but are relevant to the query attributes as a function of the learned plurality of pair-wise correlations; and ranks the retrieved plurality of images as a function of respective total numbers of their attributes that are also within the query subset plurality of attributes.

Embodiments may also perform processes or provide embodiments of the present invention on a subscription, advertising, and/or fee basis. That is, a service provider could offer to provide automated retrieval and ranking of multi-attribute query results according to relevance to the multi-attribute query as described above with respect to FIGS. 1 through 5. Thus, the service provider can create, maintain, and support, etc., a computer infrastructure, such as the network computer system 522 and/or network environment 526 that performs the process steps of the invention for one or more customers. In return, the service provider can receive payment from the customer(s) under a subscription and/or fee agreement and/or the service provider can receive payment from the sale of advertising content to one or more third parties.

In still another embodiment, the invention provides a computer-implemented method for executing one or more of the processes, systems and articles for providing automated searches of images and rankings with respect to query attributes as described above with respect to FIGS. 1-4. In this case, a computer infrastructure, such as the computer 522 or network infrastructure 526, or the apparatus or device embodiment 402, can be provided and one or more systems for performing the process steps of the invention can be obtained (e.g., created, purchased, used, modified, etc.), for example deployed to the computer 522 infrastructure or the apparatus or device embodiment 402. To this extent, the deployment of a system or device can comprise one or more of: (1) installing program code on a computing device, such as the computers/devices 522, from a computer-readable medium device 516, 520 or 506, the Image Attribute Detector Trainer 404 or the Multi-Attribute Retrieval and Ranking Model Trainer 408; (2) adding one or more computing devices to the computer infrastructure 522 or the apparatus or device embodiment 402; and (3) incorporating and/or modifying one or more existing systems of the computer infrastructure or device to enable the computer infrastructure or device to perform the process steps of the invention.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Certain examples and elements described in the present specification, including in the claims and as illustrated in the Figures, may be distinguished or otherwise identified from others by unique adjectives (e.g. a “first” element distinguished from another “second” or “third” of a plurality of elements, a “primary” distinguished from a “secondary” one or “another” item, etc.). Such identifying adjectives are generally used to reduce confusion or uncertainty, and are not to be construed to limit the claims to any specific illustrated element or embodiment, or to imply any precedence, ordering or ranking of any claim elements, limitations or process steps.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

What is claimed is:
 1. A method for retrieving and ranking multi-attribute query results according to relevance to attributes of a multi-attribute query, the method comprising: training each of a plurality of image attribute detectors for one each of a plurality of different attributes that are annotated in a training dataset of images of people; learning via a processor a plurality of pair-wise correlations between each pair of the plurality of annotated attributes from the training dataset of images; searching via the trained attribute detectors an input image dataset for images comprising at least one of a multi-attribute query subset plurality of the annotated attributes; retrieving a plurality of images from the searching of the input image dataset that each comprise at least one of the query subset plurality of attributes; and ranking the retrieved plurality of images as a function of comprising different ones of the attributes that are not within the query subset plurality of attributes but are paired to the at least one of the query subset plurality of attributes by the pair-wise correlations, wherein the ranking is an order of likelihood that the different ones of the attributes will appear in an image with the paired at least one of the query subset plurality of attributes.
 2. The method of claim 1, further comprising: integrating computer-readable program code into a computer system comprising a processor, a computer readable memory in circuit communication with the processor, and a computer readable storage medium in circuit communication with the processor; and wherein the processor executes program code instructions stored on the computer-readable storage medium via the computer readable memory and thereby performs the steps of training each of the plurality of image attribute detectors, learning the plurality of pair-wise correlations between each pair of the plurality of annotated attributes from the training dataset of images, searching via the trained attribute detectors the input image dataset for images comprising at least one of a multi-attribute query subset plurality of the annotated attributes, retrieving the plurality of images from the searching of the input image dataset that each comprise at least one of the query subset plurality of attributes, and ranking the retrieved plurality of images as a function of comprising different ones of the attributes that are not within the query subset plurality of attributes but are paired to the at least one of the query subset plurality of attributes by the pair-wise correlations.
 3. The method of claim 1, wherein a first of the annotated attributes is more heavily weighted than a second of the annotated attributes; and wherein the ranking the retrieved plurality of images further comprises ranking a one of the results with the more heavily weighted first attribute higher than another of the results that has the second attribute and a same total number of the attributes that are also within the query subset plurality of attributes.
 4. The method of claim 3, wherein the learning the plurality of pair-wise correlations between each pair of the plurality of annotated attributes further comprises: reverse learning a mapping of a set of labels of the annotated attributes to the images in the training dataset of images to predict respective sets of the training dataset images that each contain one of the annotated attribute labels.
 5. The method of claim 4, wherein the retrieving the plurality of images from the searching of the input image dataset that each comprise at least one of the query subset plurality of attributes further comprises: predicting the retrieved plurality of images by maximizing weighted feature vectors extracted by each of the trained image attribute detectors as a function of a component modeling an appearance of the attribute of the each of the trained image attribute detectors, and a component modeling a dependency between the attribute of the each of the attributes of the trained image attribute detectors to another one of the annotated attributes in the training dataset of images that is not in the query subset plurality of attributes.
 6. The method of claim 5, wherein the learning the pair-wise correlations is a max-margin training.
 7. The method of claim 6, wherein the predicting the retrieved plurality of images by maximizing the weighted feature vectors extracted by each of the trained image attribute detectors further comprises: employing a complex loss function to more heavily penalize one of the weighted feature vector outputs that deviates more from a correct output measured based on an optimized performance metric than another of the weighted feature vector outputs having a lesser deviation from the correct output measured based on the optimized performance metric.
 8. The method of claim 7, wherein the max-margin training further comprises: generating a plurality of constraints; and iteratively adding most violated constraints of the generating a plurality of constraints to the optimized performance metric.
 9. A system, comprising: a processing unit, a computer readable memory and a computer readable storage medium; first program instructions to train each of a plurality of image attribute detectors for one each of a plurality of different attributes that are annotated in a training dataset of images of people; second program instructions to learn a plurality of pair-wise correlations between each pair of the plurality of annotated attributes from the training dataset of images; third program instructions to search via the trained attribute detectors an input image dataset for images comprising at least one of a multi-attribute query subset plurality of the annotated attributes; and fourth program instructions to retrieve a plurality of images from the searching of the input image dataset that each comprise at least one of the query subset plurality of attributes, the fourth program instructions further to rank the retrieved plurality of images as a function of comprising different ones of the attributes that are not within the query subset plurality of attributes but are paired to the at least one of the query subset plurality of attributes by the pair-wise correlations, wherein the ranking is an order of likelihood that the different ones of the attributes will appear in an image with the paired at least one of the query subset plurality of attributes; and wherein the first, second, third and fourth program instructions are stored on the computer readable storage medium for execution by the processing unit via the computer readable memory.
 10. The system of claim 9, wherein a first of the annotated attributes is more heavily weighted than a second of the annotated attributes; and wherein the fourth program instructions are further to rank the retrieved plurality of images by ranking a one of the results with the more heavily weighted first attribute higher than another of the results that has the second attribute and a same total number of the attributes that are also within the query subset plurality of attributes.
 11. The system of claim 10, wherein the second program instructions are further to learn the plurality of pair-wise correlations between each pair of the plurality of annotated attributes by reverse learning a mapping of a set of labels of the annotated attributes to the images in the training dataset of images to predict respective sets of the training dataset images that each contain one of the annotated attribute labels.
 12. The system of claim 11, wherein the fourth program instructions are further to retrieve the plurality of images from the searching of the input image dataset that each comprise at least one of the query subset plurality of attributes by: predicting the retrieved plurality of images by maximizing weighted feature vectors extracted by each of the trained image attribute detectors as a function of a component modeling an appearance of the attribute of the each of the trained image attribute detectors, and a component modeling a dependency between the attribute of the each of the attributes of the trained image attribute detectors to another one of the annotated attributes in the training dataset of images that is not in the query subset plurality of attributes.
 13. The system of claim 12, wherein the learning the pair-wise correlations is a max-margin training.
 14. The system of claim 13, wherein the fourth program instructions are further to predict the retrieved set of images by maximizing the weighted feature vectors extracted by each of the trained image attribute detectors by employing a complex loss function to more heavily penalize one of the weighted feature vector outputs that deviates more from a correct output measured based on an optimized performance metric than another of the weighted feature vector outputs having a lesser deviation from the correct output measured based on the optimized performance metric; and wherein the second program instructions are further to learn the pair-wise correlations through the max-margin training by generating a plurality of constraints and iteratively adding most violated constraints of the generating a plurality of constraints to the optimized performance metric.
 15. An article of manufacture, comprising: a computer readable hardware storage device having computer readable program code embodied therewith, the computer readable program code comprising instructions for execution by a processor that cause the processor to: train each of a plurality of image attribute detectors for one each of a plurality of different attributes that are annotated in a training dataset of images of people; learn a plurality of pair-wise correlations between each pair of the plurality of annotated attributes from the training dataset of images; search via the trained attribute detectors an input image dataset for images comprising at least one of a multi-attribute query subset plurality of the annotated attributes; retrieve a plurality of images from the searching of the input image dataset that each comprise at least one of the query subset plurality of attributes; and rank the retrieved plurality of images as a function of comprising different ones of the attributes that are not within the query subset plurality of attributes but are paired to the at least one of the query subset plurality of attributes by the pair-wise correlations, wherein the ranking is an order of likelihood that the different ones of the attributes will appear in an image with the paired at least one of the query subset plurality of attributes.
 16. The article of manufacture of claim 15, wherein a first of the annotated attributes is more heavily weighted than a second of the annotated attributes; and wherein the computer readable program code instructions for execution by the processor further cause the processor to rank the retrieved plurality of images by ranking a one of the results with the more heavily weighted first attribute higher than another of the results that has the second attribute and a same total number of the attributes that are also within the query subset plurality of attributes.
 17. The article of manufacture of claim 16, wherein the computer readable program code instructions for execution by the processor further cause the processor to learn the plurality of pair-wise correlations between each pair of the plurality of annotated attributes by reverse learning a mapping of a set of labels of the annotated attributes to the images in the training dataset of images to predict respective sets of the training dataset images that each contain one of the annotated attribute labels.
 18. The article of manufacture of claim 17, wherein the computer readable program code instructions for execution by the processor further cause the processor to retrieve the plurality of images from the searching of the input image dataset that each comprise at least one of the query subset plurality of attributes by: predicting the retrieved plurality of images by maximizing weighted feature vectors extracted by each of the trained image attribute detectors as a function of a component modeling an appearance of the attribute of the each of the trained image attribute detectors, and a component modeling a dependency between the attribute of the each of the attributes of the trained image attribute detectors to another one of the annotated attributes in the training dataset of images that is not in the query subset plurality of attributes.
 19. The article of manufacture of claim 18, wherein the learning the pair-wise correlations is a max-margin training.
 20. The article of manufacture of claim 19, wherein the computer readable program code instructions for execution by the processor further cause the processor to: predict the retrieved set of images by maximizing the weighted feature vectors extracted by each of the trained image attribute detectors by employing a complex loss function to more heavily penalize one of the weighted feature vector outputs that deviates more from a correct output measured based on an optimized performance metric than another of the weighted feature vector outputs having a lesser deviation from the correct output measured based on the optimized performance metric; and learn the pair-wise correlations through the max-margin training by generating a plurality of constraints, and iteratively adding most violated constraints of the generating a plurality of constraints to the optimized performance metric.